Exploration server: Health and musical practice

Warning: this site is under development.
It was generated automatically from raw corpora, so the information has not been validated.

Zipf's law in short-time timbral codings of speech, music, and environmental sound signals.

Internal identifier: 001253 (Main/Exploration); previous: 001252; next: 001254

Authors: Martín Haro [Spain]; Joan Serrà; Perfecto Herrera; Alvaro Corral

Source: PLoS One, 2012, 7(3): e33993

RBID : pubmed:22479497

Abstract

Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis on the intrinsic characteristics of most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, these database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources.
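
The rank-frequency analysis described in the abstract can be illustrated with a small, hedged sketch. It does not reproduce the authors' pipeline: it generates synthetic code-word tokens with the classic, memoryless Simon (Yule-Simon) process rather than the with-memory variant the abstract refers to, ranks the code-words by count, and estimates a Zipf exponent with a plain least-squares fit in log-log space. The function names, the `alpha` value, the token count, and the `max_rank` cut-off are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def simon_process(n_tokens, alpha, seed=0):
    """Generate tokens with the classic (memoryless) Simon process.

    With probability alpha a brand-new code-word is emitted; otherwise a
    uniformly chosen earlier token is repeated, so code-words that are
    already frequent tend to become more frequent.
    """
    rng = random.Random(seed)
    tokens = [0]          # the sequence starts with a single code-word
    next_id = 1           # identifier of the next unseen code-word
    for _ in range(n_tokens - 1):
        if rng.random() < alpha:
            tokens.append(next_id)
            next_id += 1
        else:
            tokens.append(rng.choice(tokens))
    return tokens


def rank_frequency(tokens):
    """Return code-word counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(freqs, max_rank=None):
    """Negated least-squares slope of log(frequency) against log(rank)."""
    freqs = freqs[:max_rank] if max_rank else freqs
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Hypothetical parameters, chosen only to keep the demo fast.
tokens = simon_process(n_tokens=50_000, alpha=0.05, seed=1)
freqs = rank_frequency(tokens)
print("distinct code-words:", len(freqs))
print("top five counts:", freqs[:5])
print("fitted exponent:", round(zipf_exponent(freqs, max_rank=200), 2))
```

A least-squares fit on log-log data is the simplest estimator but is known to be biased for heavy-tailed data; a maximum-likelihood fit would be a more careful choice for real measurements.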

DOI: 10.1371/journal.pone.0033993
PubMed: 22479497
PubMed Central: PMC3315504


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Zipf's law in short-time timbral codings of speech, music, and environmental sound signals.</title>
<author>
<name sortKey="Haro, Martin" sort="Haro, Martin" uniqKey="Haro M" first="Martín" last="Haro">Martín Haro</name>
<affiliation wicri:level="4">
<nlm:affiliation>Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain. martin.haro@upf.edu</nlm:affiliation>
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>Music Technology Group, Universitat Pompeu Fabra, Barcelona</wicri:regionArea>
<placeName>
<settlement type="city">Barcelone</settlement>
<region nuts="2" type="region">Catalogne</region>
</placeName>
<orgName type="university">Université Pompeu Fabra</orgName>
</affiliation>
</author>
<author>
<name sortKey="Serra, Joan" sort="Serra, Joan" uniqKey="Serra J" first="Joan" last="Serrà">Joan Serrà</name>
</author>
<author>
<name sortKey="Herrera, Perfecto" sort="Herrera, Perfecto" uniqKey="Herrera P" first="Perfecto" last="Herrera">Perfecto Herrera</name>
</author>
<author>
<name sortKey="Corral, Alvaro" sort="Corral, Alvaro" uniqKey="Corral A" first="Alvaro" last="Corral">Alvaro Corral</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2012">2012</date>
<idno type="RBID">pubmed:22479497</idno>
<idno type="pmid">22479497</idno>
<idno type="doi">10.1371/journal.pone.0033993</idno>
<idno type="pmc">PMC3315504</idno>
<idno type="wicri:Area/Main/Corpus">001309</idno>
<idno type="wicri:explorRef" wicri:stream="Main" wicri:step="Corpus" wicri:corpus="PubMed">001309</idno>
<idno type="wicri:Area/Main/Curation">001309</idno>
<idno type="wicri:explorRef" wicri:stream="Main" wicri:step="Curation">001309</idno>
<idno type="wicri:Area/Main/Exploration">001309</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Zipf's law in short-time timbral codings of speech, music, and environmental sound signals.</title>
<author>
<name sortKey="Haro, Martin" sort="Haro, Martin" uniqKey="Haro M" first="Martín" last="Haro">Martín Haro</name>
<affiliation wicri:level="4">
<nlm:affiliation>Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain. martin.haro@upf.edu</nlm:affiliation>
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>Music Technology Group, Universitat Pompeu Fabra, Barcelona</wicri:regionArea>
<placeName>
<settlement type="city">Barcelone</settlement>
<region nuts="2" type="region">Catalogne</region>
</placeName>
<orgName type="university">Université Pompeu Fabra</orgName>
</affiliation>
</author>
<author>
<name sortKey="Serra, Joan" sort="Serra, Joan" uniqKey="Serra J" first="Joan" last="Serrà">Joan Serrà</name>
</author>
<author>
<name sortKey="Herrera, Perfecto" sort="Herrera, Perfecto" uniqKey="Herrera P" first="Perfecto" last="Herrera">Perfecto Herrera</name>
</author>
<author>
<name sortKey="Corral, Alvaro" sort="Corral, Alvaro" uniqKey="Corral A" first="Alvaro" last="Corral">Alvaro Corral</name>
</author>
</analytic>
<series>
<title level="j">PloS one</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2012" type="published">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms (MeSH)</term>
<term>Models, Statistical (MeSH)</term>
<term>Music (MeSH)</term>
<term>Pitch Discrimination (MeSH)</term>
<term>Sound (MeSH)</term>
<term>Sound Spectrography (MeSH)</term>
<term>Speech Acoustics (MeSH)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Acoustique de la voix (MeSH)</term>
<term>Algorithmes (MeSH)</term>
<term>Discrimination de la hauteur tonale (MeSH)</term>
<term>Modèles statistiques (MeSH)</term>
<term>Musique (MeSH)</term>
<term>Son (physique) (MeSH)</term>
<term>Spectrographie sonore (MeSH)</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Models, Statistical</term>
<term>Music</term>
<term>Pitch Discrimination</term>
<term>Sound</term>
<term>Sound Spectrography</term>
<term>Speech Acoustics</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Acoustique de la voix</term>
<term>Algorithmes</term>
<term>Discrimination de la hauteur tonale</term>
<term>Modèles statistiques</term>
<term>Musique</term>
<term>Son (physique)</term>
<term>Spectrographie sonore</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis on the intrinsic characteristics of most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, this database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">22479497</PMID>
<DateCompleted>
<Year>2012</Year>
<Month>11</Month>
<Day>19</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>11</Month>
<Day>13</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1932-6203</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>7</Volume>
<Issue>3</Issue>
<PubDate>
<Year>2012</Year>
</PubDate>
</JournalIssue>
<Title>PloS one</Title>
<ISOAbbreviation>PLoS One</ISOAbbreviation>
</Journal>
<ArticleTitle>Zipf's law in short-time timbral codings of speech, music, and environmental sound signals.</ArticleTitle>
<Pagination>
<MedlinePgn>e33993</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1371/journal.pone.0033993</ELocationID>
<Abstract>
<AbstractText>Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis on the intrinsic characteristics of most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, this database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Haro</LastName>
<ForeName>Martín</ForeName>
<Initials>M</Initials>
<AffiliationInfo>
<Affiliation>Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain. martin.haro@upf.edu</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Serrà</LastName>
<ForeName>Joan</ForeName>
<Initials>J</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Herrera</LastName>
<ForeName>Perfecto</ForeName>
<Initials>P</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Corral</LastName>
<ForeName>Alvaro</ForeName>
<Initials>A</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2012</Year>
<Month>03</Month>
<Day>29</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>PLoS One</MedlineTA>
<NlmUniqueID>101285081</NlmUniqueID>
<ISSNLinking>1932-6203</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="N">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D015233" MajorTopicYN="Y">Models, Statistical</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D009146" MajorTopicYN="Y">Music</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D010897" MajorTopicYN="Y">Pitch Discrimination</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D013016" MajorTopicYN="Y">Sound</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D013018" MajorTopicYN="N">Sound Spectrography</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D013061" MajorTopicYN="Y">Speech Acoustics</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2011</Year>
<Month>10</Month>
<Day>27</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2012</Year>
<Month>02</Month>
<Day>22</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2012</Year>
<Month>4</Month>
<Day>6</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2012</Year>
<Month>4</Month>
<Day>6</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2012</Year>
<Month>12</Month>
<Day>10</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">22479497</ArticleId>
<ArticleId IdType="doi">10.1371/journal.pone.0033993</ArticleId>
<ArticleId IdType="pii">PONE-D-11-21323</ArticleId>
<ArticleId IdType="pmc">PMC3315504</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Nature. 1991 Jul 18;352(6332):236-8</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">1857418</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2007 Jan 30;104(5):1461-4</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17244704</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Cognition. 1999 Jan 1;69(3):B17-24</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">10193053</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics. 1996 Feb;53(2):1465-1469</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">9964408</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2001 Mar 8;410(6825):242-50</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11258379</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2008 Jun 19;453(7198):988-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18563138</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2003 Feb 4;100(3):788-91</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12540826</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Phys Rev E Stat Nonlin Soft Matter Phys. 2011 Jun;83(6 Pt 2):066103</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21797437</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 1982 May;79(10):3380-3</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16593191</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 1990 Feb 1;87(3):938-41</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11607061</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 1991 Apr 15;88(8):3507-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11607178</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Acoust Soc Am. 2007 Aug;122(2):881-91</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17672638</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Phys Rev Lett. 2003 Feb 28;90(8):088104</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12633465</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2011;6(12):e28317</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22194825</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2011;6(10):e27024</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22046436</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2010 Sep 14;107(37):16023-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20805513</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2002 Nov 22;298(5598):1569-79</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12446899</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Neurosci Lett. 2008 May 2;436(1):85-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18359163</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2010;5(3):e9411</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20231884</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2009 Jul 28;106(30):12251-4</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19597146</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nature. 2005 May 12;435(7039):207-11</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15889093</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Phys Rev E Stat Nonlin Soft Matter Phys. 2010 Jul;82(1 Pt 1):011102</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20866560</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
<affiliations>
<list>
<country>
<li>Espagne</li>
</country>
<region>
<li>Catalogne</li>
</region>
<settlement>
<li>Barcelone</li>
</settlement>
<orgName>
<li>Université Pompeu Fabra</li>
</orgName>
</list>
<tree>
<noCountry>
<name sortKey="Corral, Alvaro" sort="Corral, Alvaro" uniqKey="Corral A" first="Alvaro" last="Corral">Alvaro Corral</name>
<name sortKey="Herrera, Perfecto" sort="Herrera, Perfecto" uniqKey="Herrera P" first="Perfecto" last="Herrera">Perfecto Herrera</name>
<name sortKey="Serra, Joan" sort="Serra, Joan" uniqKey="Serra J" first="Joan" last="Serrà">Joan Serrà</name>
</noCountry>
<country name="Espagne">
<region name="Catalogne">
<name sortKey="Haro, Martin" sort="Haro, Martin" uniqKey="Haro M" first="Martín" last="Haro">Martín Haro</name>
</region>
</country>
</tree>
</affiliations>
</record>
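
The record above can also be read outside Dilib. What follows is a minimal sketch, assuming the record text is available as a Python string. The printed record uses the `wicri:` and `nlm:` prefixes without showing their namespace declarations, so a namespace-aware parser such as `xml.etree.ElementTree` rejects the record as a whole; the sketch therefore cuts out the prefix-free `<pubmed>` subtree before parsing. `SAMPLE` (an abridged copy of the record) and `parse_pubmed_block` are illustrative names, not part of Dilib or Wicri.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record, kept only to make the sketch self-contained.
SAMPLE = """<record>
<TEI><teiHeader><affiliation wicri:level="4"/></teiHeader></TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">22479497</PMID>
<Article PubModel="Print-Electronic">
<ArticleTitle>Zipf's law in short-time timbral codings of speech, music, and environmental sound signals.</ArticleTitle>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Haro</LastName><ForeName>Martín</ForeName></Author>
<Author ValidYN="Y"><LastName>Serrà</LastName><ForeName>Joan</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>
</record>"""


def parse_pubmed_block(record_text):
    """Extract PMID, title and author names from the <pubmed> subtree."""
    # Slice out the subtree that carries no undeclared namespace prefixes.
    start = record_text.index("<pubmed>")
    end = record_text.index("</pubmed>") + len("</pubmed>")
    root = ET.fromstring(record_text[start:end])
    authors = [
        f"{author.findtext('ForeName')} {author.findtext('LastName')}"
        for author in root.iter("Author")
    ]
    return {
        "pmid": root.findtext(".//PMID"),
        "title": root.findtext(".//ArticleTitle"),
        "authors": authors,
    }


info = parse_pubmed_block(SAMPLE)
print(info["pmid"], "|", info["title"])
print("; ".join(info["authors"]))
```

String slicing is a deliberately simple choice here; if the namespace declarations are available, declaring them on the root element and parsing the full record would be the more robust route.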

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/SanteMusiqueV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001253 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001253 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    SanteMusiqueV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:22479497
   |texte=   Zipf's law in short-time timbral codings of speech, music, and environmental sound signals.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:22479497" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a SanteMusiqueV1 

This area was generated with Dilib version V0.6.38.
Data generation: Mon Mar 8 15:23:44 2021. Site generation: Mon Mar 8 15:23:58 2021